Abstract



The design and optimization of quantum circuits is central to quantum computation. This paper
presents new algorithms for compiling arbitrary $2^n \times 2^n$ unitary matrices into efficient circuits of
$(n-1)$-controlled single-qubit and $(n-1)$-controlled-NOT gates. We first present a general algebraic
optimization technique, which we call the Palindrome Transform, that can be used to minimize the number of
self-inverting gates in quantum circuits consisting of concatenations of palindromic subcircuits. For a fixed
column ordering of two-level decomposition, we then give an enumerative algorithm for minimal
$(n-1)$-controlled-NOT circuit construction, which we call the Palindromic Optimization Algorithm. Our work
dramatically reduces the number of gates generated by the conventional two-level decomposition method
for constructing quantum circuits of $(n-1)$-controlled single-qubit and $(n-1)$-controlled-NOT gates.

1 Introduction



The recent discovery of algorithms for prime factorization, discrete logarithms and other important problems 
[10, 16] that are more efficient on quantum computers than classical computers has escalated interest in 
quantum computing. However, physical limitations of current quantum technologies, such as coherence 
time and the number of available qubits, prevent the usage of quantum algorithms in any computationally 
significant setting. It is important, therefore, for any implementation of a quantum algorithm to make 
efficient use of the underlying quantum computing resources. 

No matter what technology will ultimately be used to implement quantum computers, the quantum 
circuit is most likely to remain the primary model for quantum computation [8, 13, 17]. It allows us 
to represent an algorithm to be implemented by any quantum computer as a composition of quantum 
gates. Although it is analogous to a classical logic circuit, a quantum circuit requires novel compilation 
and optimization algorithms since the criteria for efficient quantum computation are radically different from 
classical computation. It is particularly important to reduce the size of quantum circuits in the early phases 
of compilation since the later phases may increase circuit sizes dramatically for each additional gate in the 
initial circuit representation [2, 4, 11, 15]. Ideally we would like to achieve the best circuit for a given class 
of gates and a given technology taking into account all relevant factors such as size, noise, decoherence time, 
and so forth. A general-purpose quantum compiler will require both technology-independent and technology- 
dependent optimization techniques to achieve these efficiency goals. Until a fully scalable quantum computer 
technology emerges, we will restrict ourselves to machine-independent techniques. 

* aho@cs.columbia.edu
† kmsvore@cs.columbia.edu

In this paper, we focus on the design and optimization of quantum circuits consisting of controlled
single-qubit gates for arbitrary $2^n \times 2^n$ unitary matrices. In particular, we focus on the reduction of
$(n-1)$-controlled-NOT gates in such circuits. To achieve this reduction, we introduce a general algebraic
gate-minimization technique, which we call the Palindrome Transform. We then present an efficient iterative
method, the Palindromic Optimization Algorithm, for decomposing a quantum circuit into matrices acting
nontrivially on two or fewer vector components (two-level matrices). These algorithms are useful in the
first phase of any general procedure for decomposing a quantum computation into an efficient quantum
circuit. Ultimately we would like to produce efficient quantum circuits for different quantum technologies
from high-level specifications of quantum computations.



2 The Quantum Circuit Model 

We use the standard Dirac notation for quantum states, where a quantum state $\psi$ is written in ket form
as $|\psi\rangle$. A quantum bit, or qubit, has state $|0\rangle$, state $|1\rangle$, or a linear combination of these
states, written as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha$ and $\beta$ are complex numbers and
$|\alpha|^2 + |\beta|^2 = 1$. The state space of $n$ qubits, which is a $2^n$-dimensional complex Hilbert space,
can be represented as a tensor product of the state spaces of the single qubits

$$\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2 = (\mathbb{C}^2)^{\otimes n} = \mathbb{C}^{2^n} \qquad (1)$$

and a state can be described by the vector

$$|\psi\rangle = \sum_{x \in \{0,1\}^n} \alpha_x |x\rangle \qquad (2)$$

where the computational basis states are of the form $|x_{n-1} \ldots x_1 x_0\rangle$ and the probability of measuring
state $|x\rangle$, where $x = x_{n-1} \ldots x_1 x_0$, is $|\alpha_x|^2$.

We can model quantum computation using the quantum circuit model developed by Deutsch [8] and Yao 
[17]. The quantum circuit model consists of qubits, quantum wires, and quantum gates, where quantum wires 
provide communication between the sequential quantum gates by transporting output from one computation 
to serve as input to another. To identify the matrix elements of particular quantum gates, we order our 
states lexicographically. In our circuit diagrams, time increases from left to right, but the order of operators 
in a matrix sequence is applied to the state from right to left. 

In the quantum circuit model, a quantum gate on $n$ qubits is a $2^n \times 2^n$ unitary matrix $U$. A composition
of quantum gates $G_k \cdots G_1$ is called a quantum circuit $C$, where the product $G_k \cdots G_1$ represents the
unitary operator computed by $C$. Two quantum circuits are equivalent if the compositions of their respective
gates represent the same unitary matrix. That is, if circuit $C_1$ represents the matrix $U_1$ and $C_2$ represents
$U_2$, and if $U_1 = U_2$, then $C_1$ is equivalent to $C_2$.

A set of quantum gates is exactly universal if it can represent any unitary operation exactly by a
composition of its gates; a set is approximately universal if it can approximate any unitary operation to an
arbitrary accuracy by a composition of its gates [7, 12]. Since there are uncountably many unitary operations,
exact universality requires an infinite generating set of quantum gates. However, approximate universality
can be achieved by certain discrete sets of quantum gates. In this paper, we consider exact universality using
the universal set of $(n-1)$-controlled single-qubit and $(n-1)$-controlled-NOT gates [7].

We use the following standard gates in our quantum circuits. The single-qubit Pauli-X operator

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad (3)$$

is similar to the classical NOT operation and takes the state $|x\rangle \to |1 - x\rangle$. There also exist operations on
multiple qubits, such as the ability to conditionally apply a single-qubit gate. Control gates perform the
target operation $S$ only if the control qubits are set appropriately. The $(n-1)$-controlled gate, written as
$\Lambda_{n-1}(S)$, denotes $n-1$ qubits controlling the application of the operator $S$ to the target qubit. Throughout
this paper, $S$ represents a single-qubit gate. The controlled operation $\Lambda_{n-1}(S)$ is defined by

$$\Lambda_{n-1}(S)\,|x_{n-1} \ldots x_1 x_0\rangle|\psi\rangle = |x_{n-1} \ldots x_1 x_0\rangle\, S^{\,x_{n-1} \wedge \cdots \wedge x_1 \wedge x_0}|\psi\rangle \qquad (4)$$

where $x_{n-1} \wedge \cdots \wedge x_1 \wedge x_0$ in the exponent of $S$ denotes the Boolean product of the bits
$x_{n-1}, \ldots, x_1, x_0$. If the product of these bits is 0, then the operator is not applied.

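The $\Lambda_1(X)$ gate is known as the controlled-NOT gate (CNOT) and performs the operation
$|x, y\rangle \to |x, x \oplus y\rangle$, where $\oplus$ denotes the logical exclusive-or operation. Henceforth, we will
refer to $\Lambda_1(X)$ as the CNOT gate. In matrix form, the CNOT gate is

$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \qquad (5)$$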
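As a quick check of equation (5), the following Python sketch confirms that the matrix sends each basis state
$|x, y\rangle$ to $|x, x \oplus y\rangle$ under the lexicographic ordering of basis states used in this paper. The helper
names `cnot_index` and `apply_to_basis` are illustrative and not part of the paper.

```python
# CNOT matrix from equation (5); rows and columns follow the
# lexicographic basis order |00>, |01>, |10>, |11>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def cnot_index(x, y):
    """Index of the basis state |x, x XOR y> (hypothetical helper)."""
    return 2 * x + (x ^ y)

def apply_to_basis(matrix, index):
    """Image of basis state `index`, i.e. the corresponding matrix column."""
    return [row[index] for row in matrix]

for x in (0, 1):
    for y in (0, 1):
        column = apply_to_basis(CNOT, 2 * x + y)
        # The image of |x, y> has its single 1 at the index of |x, x XOR y>.
        assert column.index(1) == cnot_index(x, y)
```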



In this paper, we focus on decomposition techniques using two-level unitary matrices, where a two-level
unitary matrix acts nontrivially on two or fewer vector components. Figure 1 shows a two-level matrix $M$.
Row $c$ contains 0's except for the two complex numbers $\alpha$ and $\beta$ shown. Likewise, row $r$ contains
0's except for the two complex numbers $\gamma$ and $\delta$. The rest of the matrix has 1's on the diagonal and 0's
elsewhere. $M$ acts nontrivially on the space spanned by vector components $c$ and $r$. We define $\tilde{M}$ to be
the $2 \times 2$ unitary submatrix consisting of $\alpha$, $\beta$, $\gamma$, and $\delta$ shown in Figure 2. We call this
matrix the component matrix of $M$. Clearly, $\tilde{M}$ is a unitary operator that acts on a single qubit. When
necessary, we will indicate the vector components $c$ and $r$ on which $M$ nontrivially acts by writing $M_{c,r}$
and $\tilde{M}_{c,r}$.

$$M_{c,r} = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & \alpha & \cdots & \beta & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & \gamma & \cdots & \delta & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}$$

Figure 1: A generic two-level matrix $M$.

$$\tilde{M}_{c,r} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

Figure 2: The component matrix $\tilde{M}$ of $M$.
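To make the relationship between a two-level matrix and its component matrix concrete, here is a minimal Python
sketch that embeds a $2 \times 2$ component matrix into an identity matrix at components $c$ and $r$ and checks
unitarity numerically. The function names and the choice of a Hadamard-like component matrix are assumptions
made for illustration only.

```python
def two_level_matrix(dim, c, r, component):
    """Embed a 2x2 component matrix into a dim x dim identity at rows/columns c and r."""
    (alpha, beta), (gamma, delta) = component
    M = [[1 if i == j else 0 for j in range(dim)] for i in range(dim)]
    M[c][c], M[c][r] = alpha, beta
    M[r][c], M[r][r] = gamma, delta
    return M

def is_unitary(M, tol=1e-12):
    """Check that M times its adjoint is the identity, entry by entry."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            entry = sum(M[i][k] * complex(M[j][k]).conjugate() for k in range(n))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True

s = 2 ** -0.5
component = ((s, s), (s, -s))              # example component matrix (illustrative)
M = two_level_matrix(8, 2, 5, component)   # acts nontrivially on components 2 and 5
print(is_unitary(M))
```

The two-level matrix is unitary exactly when its component matrix is, which is why the decomposition can be
reasoned about one $2 \times 2$ block at a time.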



3 A Framework for Quantum Circuit Compilation 

We now describe the first phase of our quantum circuit compilation process that generates for an arbitrary
unitary matrix $U$ an exact quantum circuit consisting of $(n-1)$-controlled single-qubit gates and
$(n-1)$-controlled-NOT gates [13, 14]. The compilation steps of this phase are shown in Figure 3.

Figure 3: The compilation steps of exact quantum circuit generation.



This first phase, called two-level decomposition, takes as input a $2^n \times 2^n$ unitary matrix $U$ and an
ordering of decomposition and outputs a sequence of two-level matrices $V_1 \ldots V_k$ such that $V_1 \cdots V_k = U$,
where $k \le 2^{n-1}(2^n - 1)$. This output is then converted into an optimized circuit, $G_m \ldots G_1$, of
$\Lambda_{n-1}(S)$ and $\Lambda_{n-1}(X)$ gates. Using standard techniques, the circuit of controlled operations can be
further decomposed into a circuit composed of gates drawn from some universal set of basis gates [2]. One common
exactly universal set is the set of single-qubit and CNOT gates [2]. Our framework here builds on and refines the
conventional ordering and two-level decomposition method described in [2, 13, 14].

In this paper, we improve the first phase by finding an optimal ordering of decomposition for the
two-level decomposition phase to minimize the number of $\Lambda_{n-1}(X)$ gates generated for the circuit
$G_m \ldots G_1$ corresponding to $U$. The remaining sections of this paper are organized as follows. In Section 4,
we describe the conventional ordering and two-level decomposition algorithm used in the first step. In Section 5,
we describe the second step that constructs a circuit of controlled single-qubit gates from the sequence of
two-level matrices. In Section 6, we describe the Palindrome Transform that characterizes the optimal ways to
order subcircuits to maximize the amount of cancellation of self-inverting gates. In Section 7, we introduce
our Palindromic Optimization Algorithm (POA) that dramatically improves upon the conventional ordering
used in the two-level decomposition algorithm of the first phase. In Sections 8 and 9 we derive equations for
the number of generated gates and compare the sizes of optimized and unoptimized circuits.



4 Two-Level Decomposition 

We now describe the first phase of our quantum circuit compiler. This phase, called two-level decomposition,
takes as input an arbitrary $2^n \times 2^n$ unitary matrix $U$ and produces as output a composition of two-level
matrices $V_1 \ldots V_k$ such that the product $V_1 \cdots V_k$ equals $U$. Phase I as described in this section uses
the conventional ordering for two-level decomposition. In Section 7, we give a method for computing an
improved ordering that dramatically reduces the size of the generated circuit.

We define the order of two-level decomposition as the sequence of vector component pairs that are
nontrivially acted on by the two-level matrices in the decomposition $V_1 \ldots V_k$. We will associate an ordering
pair $(r, c)$ with a two-level matrix $V_j$ to identify the four complex numbers $V_j[c,c]$, $V_j[c,r]$, $V_j[r,c]$,
$V_j[r,r]$ in the component matrix $\tilde{V}_j$. The sequence of ordering pairs defines the order of the two-level
decomposition. To avoid repetition in a two-level decomposition, we only allow pairs $(r, c)$ where $r > c$.
Throughout this paper, the first number of an ordering pair represents a row and the second a column in a matrix.

In all our sequences of ordering pairs, we begin with the pairs for column 0, followed by those for column 1,
followed by those for column 2, and so on up to column $2^n - 2$. We call this a fixed-column ordering. In the
conventional algorithm for two-level decomposition, the ordering has the pairs
$(c+1, c), (c+2, c), \ldots, (2^n - 1, c)$ for column $c$ followed by the pairs
$(c+2, c+1), (c+3, c+1), \ldots, (2^n - 1, c+1)$ for column $c+1$, and so on.

We will use a triangular array $\mathit{order}_n$ to store the ordering pairs. The entries in rows
$1, 2, \ldots, 2^n - 1 - c$ of column $c$ in $\mathit{order}_n$ represent the ordering pairs
$(\mathit{order}_n[1, c], c), (\mathit{order}_n[2, c], c), \ldots, (\mathit{order}_n[2^n - 1 - c, c], c)$.
For $n = 2$, the order array $\mathit{order}_2$ for the conventional algorithm, with rows and columns indexed
from 0 and unused entries left blank, is

$$\mathit{order}_2 = \begin{pmatrix} & & & \\ 1 & 2 & 3 & \\ 2 & 3 & & \\ 3 & & & \end{pmatrix}$$

Note that row 0 and column $2^n - 1$ are not used in the two-level decomposition algorithm since they violate
the condition that the row value must be greater than the column value, but they are included for notational
convenience.
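The conventional fixed-column ordering described above can be enumerated directly. The following Python sketch
lists the ordering pairs $(r, c)$ column by column; the function name `conventional_order` is an assumption made
for illustration.

```python
def conventional_order(n):
    """Ordering pairs (r, c) of the conventional fixed-column ordering on n qubits.

    Columns run from 0 to 2^n - 2; within column c the rows run c+1, ..., 2^n - 1.
    """
    dim = 2 ** n
    return [(r, c) for c in range(dim - 1) for r in range(c + 1, dim)]

pairs = conventional_order(2)
print(pairs)  # [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2)]
```

For $n$ qubits the list has $2^{n-1}(2^n - 1)$ pairs, which matches the bound on the number of two-level matrices
in the decomposition.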

Algorithm 1: Two-Level Decomposition

Input: A $2^n \times 2^n$ unitary matrix $U$ and a $2^n \times 2^n$ array $\mathit{order}_n$ dictating the order of the
two-level decomposition.

Output: A sequence of two-level matrices $V_1 \ldots V_k$ such that $V_1 \cdots V_k = U$.

Method:

procedure TwoLevelDecompose($U$, $\mathit{order}_n$) {
    $M = U$;
    $j = 1$;
    for $c = 0$ to $2^n - 2$ do {
        for $r = \mathit{order}_n[1, c]$ to $\mathit{order}_n[2^n - c - 1, c]$ do {
            if $c$ equals $2^n - 2$ then {
                $M_j = I$;
                $M_j[c,c] = M[c,c]^*$;
                $M_j[c,r] = M[r,c]^*$;
                $M_j[r,c] = M[c,r]^*$;
                $M_j[r,r] = M[r,r]^*$;
            }
            else if $M[r,c]$ equals 0 then {
                $M_j = I$;
                if $r$ equals $\mathit{order}_n[2^n - c - 1, c]$ then
                    $M_j[c,c] = M[c,c]^*$;
            }
            else {
                $M_j = I$;
                $M_j[c,c] = M[c,c]^* / \sqrt{|M[c,c]|^2 + |M[r,c]|^2}$;
                $M_j[c,r] = M[r,c]^* / \sqrt{|M[c,c]|^2 + |M[r,c]|^2}$;
                $M_j[r,c] = M[r,c] / \sqrt{|M[c,c]|^2 + |M[r,c]|^2}$;
                $M_j[r,r] = -M[c,c] / \sqrt{|M[c,c]|^2 + |M[r,c]|^2}$;
            }
            output $V_j$;
            $M = M_j * M$;
            $j = j + 1$;
        }
    }
}

To perform a conventional two-level decomposition on $U$, we call the procedure TwoLevelDecompose
on $U$ and the conventional ordering array $\mathit{order}_n$ using Algorithm 1. With the conventional ordering array
as input, the algorithm applies a transformation $M_1$ to $U$ to set the matrix entry $(M_1 U)[1, 0]$ to 0. It then
applies a transformation $M_2$ to $M_1 U$ to set $(M_2 M_1 U)[2, 0]$ to 0. It continues in this fashion until column 0
has a 1 in the top entry and 0's everywhere else. This process is sometimes called a quantum Givens operation
[6]. It then iteratively applies this process to the $(2^n - 1) \times (2^n - 1)$ unitary submatrix in the lower
right-hand corner of $M_{2^n - 1} M_{2^n - 2} \cdots M_1 U$, ultimately decomposing $U$ into a product of two-level
unitary matrices.

Algorithm 1 produces as output a sequence of two-level unitary matrices $V_1 \ldots V_k$, where $V_j = M_j^\dagger$,
the adjoint of $M_j$. We can easily verify that $V_1 \cdots V_k = U$, and that $k \le 2^{n-1}(2^n - 1)$. We denote the
complex conjugate of a complex number $\zeta = a + ib$ as $\zeta^* = a - ib$.
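The following is a minimal pure-Python sketch of the two-level decomposition with the conventional ordering,
intended only as an illustration of the steps above. The helper names (`identity`, `matmul`, `adjoint`), the
zero tolerance `tol`, and the example matrix are assumptions and not part of the paper.

```python
import math

def identity(d):
    return [[1 + 0j if i == j else 0j for j in range(d)] for i in range(d)]

def matmul(A, B):
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)] for i in range(d)]

def adjoint(A):
    d = len(A)
    return [[A[j][i].conjugate() for j in range(d)] for i in range(d)]

def two_level_decompose(U, tol=1e-12):
    """Return two-level matrices V_1, ..., V_k with V_1 ... V_k = U (conventional ordering)."""
    d = len(U)
    M = [[complex(x) for x in row] for row in U]
    Vs = []
    for c in range(d - 1):
        for r in range(c + 1, d):
            Mj = identity(d)
            if c == d - 2:
                # Last column: M_j is the adjoint of the remaining 2x2 block.
                Mj[c][c] = M[c][c].conjugate()
                Mj[c][r] = M[r][c].conjugate()
                Mj[r][c] = M[c][r].conjugate()
                Mj[r][r] = M[r][r].conjugate()
            elif abs(M[r][c]) < tol:
                # Entry already zero; on the last row only remove a leftover phase.
                if r == d - 1:
                    Mj[c][c] = M[c][c].conjugate()
            else:
                norm = math.sqrt(abs(M[c][c]) ** 2 + abs(M[r][c]) ** 2)
                Mj[c][c] = M[c][c].conjugate() / norm
                Mj[c][r] = M[r][c].conjugate() / norm
                Mj[r][c] = M[r][c] / norm
                Mj[r][r] = -M[c][c] / norm
            Vs.append(adjoint(Mj))   # V_j is the adjoint of M_j
            M = matmul(Mj, M)
    return Vs

# Example 4x4 unitary (illustrative): a Hadamard-type matrix with one row multiplied by i.
h = 0.5
U = [[h, h, h, h], [h * 1j, -h * 1j, h * 1j, -h * 1j], [h, h, -h, -h], [h, -h, -h, h]]
Vs = two_level_decompose(U)
product = identity(4)
for V in Vs:
    product = matmul(product, V)
max_error = max(abs(product[i][j] - U[i][j]) for i in range(4) for j in range(4))
```

Multiplying the returned matrices in order reproduces $U$ up to floating-point error, and for two qubits the
sketch emits one matrix per ordering pair.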

5 Controlled Single-Qubit Gate Circuit Construction 

After performing the two-level decomposition on $U$, we need to construct a circuit from the sequence
$V_1 \ldots V_k$ of two-level matrices using $\Lambda_{n-1}(S)$ and $\Lambda_{n-1}(X)$ gates. To compute each $V_j$, the
circuit must perform a sequence of state changes in order to bring together the two vector components that are
nontrivially acted on by $V_j$. The algorithm uses Gray codes to transform each $V_j$ in $V_1 \ldots V_k$ into a
circuit of controlled single-qubit gates. We can determine the state changes needed for $V_j$ by constructing a
Gray code between the two computational basis states $|c\rangle$ and $|r\rangle$ of $V_j$.

Let us define GrayCode($c$, $r$) between state $|c\rangle$ and state $|r\rangle$ to be a minimal sequence of binary
numbers $g_1, g_2, \ldots, g_m$ in which $g_1 = c_{n-1} c_{n-2} \cdots c_0$ is the binary expansion of $c$,
$g_m = r_{n-1} r_{n-2} \cdots r_0$ is the binary expansion of $r$, and two adjacent binary expansions $g_i$ and
$g_{i+1}$ differ by only one bit for $1 \le i \le m - 1$. That is, only one bit flip occurs between two binary
numbers in the sequence. We call the order of bit flips between the binary expansion of $c$ and the binary
expansion of $r$ in the Gray code the Gray code ordering for $c$ and $r$. Note that a bit flip may not be required
for every bit position. Also, there are at most $n + 1$ binary numbers in a Gray code between any pair of states.
From the Gray code sequence, we determine the corresponding quantum circuit.
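A Gray code of this kind is easy to generate. The Python sketch below flips the rightmost differing bit first,
which is the bit order used by the GrayCode procedure in Algorithm 2; the function name and the string output
format are choices made for illustration.

```python
def gray_code(c, r, n):
    """Bit strings from c to r on n bits, flipping the rightmost differing bit first."""
    g, sequence = c, [c]
    while g != r:
        diff = g ^ r
        lowest = diff & -diff        # isolate the rightmost differing bit
        g ^= lowest                  # flip that bit
        sequence.append(g)
    return [format(x, f"0{n}b") for x in sequence]

print(gray_code(0, 7, 3))  # ['000', '001', '011', '111']
```

The length of the sequence is one more than the Hamming distance between $c$ and $r$, so it never exceeds $n + 1$.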

To construct a circuit from the Gray code $g_1, g_2, \ldots, g_m$ for the two-level unitary matrix $V_j$, we create a
$\Lambda_{n-1}(X)$ gate to transform state $|g_i\rangle$ into $|g_{i+1}\rangle$, for $1 \le i \le m - 2$. Each gate performs
a controlled bit flip on the differing qubit, conditional on all other qubits being the same as in states
$|g_i\rangle$ and $|g_{i+1}\rangle$.

After the bit-flipping operations, we create a $\Lambda_{n-1}(\tilde{V}_j)$ gate to transform state $|g_{m-1}\rangle$ into
$|g_m\rangle$ with the differing qubit as target and conditional on all other qubits being the same as in state
$|g_m\rangle$. We then create a sequence of $\Lambda_{n-1}(X)$ gates to undo the initial sequence of bit-flipping
operations by repeating them in reverse order.

Algorithm 2 presents the details of this circuit-construction process. It constructs a sequence of controlled
single-qubit gates for each two-level matrix $V_j$ in $V_1 \ldots V_k$. Note that the output of Algorithm 2 is a
sequence of palindromic subcircuits, subcircuits that read the same forwards as backwards. We will discuss the
optimization of palindromic circuits in detail in the next section.

As an example, Table 1 contains a Gray code between basis states $|000\rangle$ and $|111\rangle$. Figure 4 contains the
corresponding quantum circuit of five gates, where $\oplus$ represents the Pauli-X operator, $\circ$ represents a
control on 0, and $\bullet$ represents a control on 1.

State             Gray Code
$|000\rangle$     000
                  001
                  011
$|111\rangle$     111

Table 1: The Gray code between state $|000\rangle$ and state $|111\rangle$.

Figure 4: The circuit for the two-level matrix $V_j$ that nontrivially acts on states $|000\rangle$ and $|111\rangle$.

Algorithm 2: Controlled $(n-1)$-Single-Qubit Gate Circuit Construction

Input: A sequence of two-level unitary matrices $V_1 \ldots V_k$.

Output: A circuit composed of $\Lambda_{n-1}(\tilde{V}_j)$ and $\Lambda_{n-1}(X)$ gates, for each $V_j$, $1 \le j \le k$,
that computes the product $V_1 \cdots V_k$.

Method:

procedure ConstructCircuit($V_1 \ldots V_k$) {
    for $j = 1$ to $k$ do {
        let $|c\rangle$ and $|r\rangle$ be the basis states for $V_j$;
        let $g_1, g_2, \ldots, g_m$ = GrayCode($c$, $r$);
        for $i = 1$ to $m - 2$ do
            output ControlGate($X$, $g_i$, $g_{i+1}$);
        output ControlGate($\tilde{V}_j$, $g_{m-1}$, $g_m$);
        for $i = m - 2$ to 1 do
            output ControlGate($X$, $g_{i+1}$, $g_i$);
    }
}

procedure GrayCode($c$, $r$) {
    let $g = g_{n-1} g_{n-2} \cdots g_0$ be the binary expansion of $c$;
    let $h = h_{n-1} h_{n-2} \cdots h_0$ be the binary expansion of $r$;
    output $g$;
    while $g \ne h$ do {
        let $g_k$ be the rightmost bit in $g$ that is different from
            the corresponding bit in $h$;
        let $g = g_{n-1} \ldots g_{k+1} \bar{g}_k g_{k-1} \ldots g_0$;
        comment $\bar{g}_k$ is the complement of $g_k$;
        output $g$;
    }
}

procedure ControlGate($S$, $g_i$, $g_{i+1}$) {
    output the $(n-1)$-controlled single-qubit gate $\Lambda_{n-1}(S)$
        targeting the bit differing between $g_i$ and $g_{i+1}$
        and conditional on the other qubits being the same
        as in $g_i$;
}
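The circuit-construction step for a single two-level matrix can be sketched in Python as follows. Gates are
represented symbolically as tuples `(label, target bit, controls)`; this representation, the function names,
and the label `"V"` for the middle gate are assumptions made for illustration, and no matrices are multiplied.

```python
def gray_code(c, r):
    """Integers from c to r, flipping the rightmost differing bit first."""
    g, sequence = c, [c]
    while g != r:
        diff = g ^ r
        g ^= diff & -diff
        sequence.append(g)
    return sequence

def control_gate(label, g_from, g_to, n):
    """Symbolic (n-1)-controlled gate for the step g_from -> g_to.

    The target is the single differing bit; every other bit b is a control
    that must hold the value it has in g_from.
    """
    target = (g_from ^ g_to).bit_length() - 1
    controls = tuple((b, (g_from >> b) & 1) for b in range(n) if b != target)
    return (label, target, controls)

def palindromic_subcircuit(c, r, n, middle="V"):
    """Gate list, in time order, for one two-level matrix acting on components c != r."""
    g = gray_code(c, r)
    m = len(g)
    flips = [control_gate("X", g[i], g[i + 1], n) for i in range(m - 2)]
    mid = control_gate(middle, g[m - 2], g[m - 1], n)
    # Undoing a bit flip uses the same controlled-X gate, so the suffix is the reversed prefix.
    return flips + [mid] + flips[::-1]

circuit = palindromic_subcircuit(0, 7, 3)
for gate in circuit:
    print(gate)
```

For the pair $|000\rangle$, $|111\rangle$ this yields five gates, in agreement with the example of Table 1 and
Figure 4, and the gate list reads the same forwards and backwards.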

6 The Palindrome Transform 

In this section we present a general algorithmic optimization technique, which we call the Palindrome
Transform, that can be used to minimize the number of self-inverting gates in quantum circuits composed of
concatenated palindromic subcircuits. The minimization arises from determining an optimal ordering for
concatenating the palindromic subcircuits that induces the maximal amount of cancellation due to the
juxtaposition of self-inverting gates. We then characterize the orderings of palindromic subcircuits that
maximize the total amount of cancellation.

We call a gate $A$ self inverting if $AA = I$, that is, if $A$ is its own inverse. If we generate a sequence of
self-inverting gates of the form

$$A_1 A_2 \ldots A_{m-1} A_m A_m A_{m-1} \ldots A_2 A_1$$

then we can eliminate this sequence by replacing it with the empty sequence. We call such a sequence self
annihilating.

A number of quantum-circuit-generation algorithms produce subcircuits consisting of sequences of gates
in which a prefix and suffix of each subcircuit forms a palindrome of self-inverting gates. That is, a subcircuit
is of the form

$$A_1 A_2 \ldots A_k \,\beta\, A_k \ldots A_2 A_1 \qquad (6)$$

for $k \ge 0$, where each $A_j$ is a self-inverting gate and $\beta$ is a unique gate that is not necessarily self
inverting. For the purposes of this paper, we assume $\beta$ is a controlled single-qubit gate $\Lambda_{n-1}(S)$, where
$S$ is a component matrix. We call a sequence of the form (6) a palindromic subcircuit.¹

If $\alpha$ is a string of symbols $A_1 A_2 \ldots A_k$, then we use $\alpha^R$ to denote $A_k \ldots A_2 A_1$, the
reversal of $\alpha$. Define the overlap between two palindromic subcircuits $\alpha_1 \Delta_1 \alpha_1^R$ and
$\alpha_2 \Delta_2 \alpha_2^R$ to be the longest reversed suffix $\gamma^R$ of $\alpha_1^R$, or equivalently the longest
prefix $\gamma$ of $\alpha_2$, such that $\gamma^R \gamma$ is a self-annihilating sequence.

For example, if we concatenate the two palindromic subcircuits $ABC\Delta_1CBA$ and $AB\Delta_2BA$, we get the
circuit $ABC\Delta_1CBAAB\Delta_2BA = ABC\Delta_1C\Delta_2BA$. Here, $AB$ is the overlap between these two palindromic
subcircuits and $BAAB$ is a self-annihilating sequence.
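The overlap and the resulting cancellation can be sketched in a few lines of Python. Here a palindromic
subcircuit is represented as a pair (list of self-inverting gate labels, middle gate label); this representation
and the labels `"D1"`, `"D2"` for the middle gates are assumptions made for illustration.

```python
def overlap(alpha1, alpha2):
    """Length of the longest common prefix of the two self-inverting prefixes."""
    k = 0
    while k < min(len(alpha1), len(alpha2)) and alpha1[k] == alpha2[k]:
        k += 1
    return k

def concatenate(sub1, sub2):
    """Concatenate two palindromic subcircuits and cancel the self-annihilating part."""
    (alpha1, mid1), (alpha2, mid2) = sub1, sub2
    k = overlap(alpha1, alpha2)
    # Drop the last k gates of the first subcircuit and the first k of the second.
    left = alpha1 + [mid1] + alpha1[::-1][: len(alpha1) - k]
    right = alpha2[k:] + [mid2] + alpha2[::-1]
    return left + right

first = (["A", "B", "C"], "D1")
second = (["A", "B"], "D2")
merged = concatenate(first, second)
print(merged)  # ['A', 'B', 'C', 'D1', 'C', 'D2', 'B', 'A']
```

Because the reversed suffix of the first subcircuit mirrors its prefix, the overlap is simply the length of the
common prefix of the two self-inverting gate strings, and each unit of overlap removes two gates.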

If we have a set $PS$ of palindromic subcircuits, then we can use the following algorithm to find an optimal
ordering of all the subcircuits in $PS$ that maximizes the sum of the overlaps between successive subcircuits
in any composition of the subcircuits. We call such an ordering a maximal overlap sequence for $PS$.

The algorithm uses a data structure called a trie [1], sometimes called a radix tree [5], to store the
prefix $\alpha_j \Delta_j$ of each palindromic subcircuit $\alpha_j \Delta_j \alpha_j^R$. The trie is an ordered labeled tree in
which there is a path from the root to a leaf that spells out the string $\alpha_j \Delta_j$. The root is labeled by the
empty string and each non-root node is labeled by a gate. If there is another string $\alpha_k \Delta_k$ that has a
common prefix $\gamma$ with $\alpha_j \Delta_j$, then the paths for $\alpha_j \Delta_j$ and $\alpha_k \Delta_k$ in the trie each share
the prefix $\gamma$. For notational convenience, we will just use the middle gate $\Delta_j$ to represent a palindromic
subcircuit in a maximal overlap sequence.

Algorithm 3: The Palindrome Transform

Input: A set of $m$ palindromic subcircuits

$$PS = \{\alpha_1 \Delta_1 \alpha_1^R,\ \alpha_2 \Delta_2 \alpha_2^R,\ \ldots,\ \alpha_m \Delta_m \alpha_m^R\}$$

Output: An ordering $\Delta_{j_1}, \Delta_{j_2}, \ldots, \Delta_{j_m}$ for the concatenation of these palindromic
subcircuits such that

$$\alpha_{j_1} \Delta_{j_1} \alpha_{j_1}^R\ \alpha_{j_2} \Delta_{j_2} \alpha_{j_2}^R \ldots \alpha_{j_m} \Delta_{j_m} \alpha_{j_m}^R$$

maximizes

$$\sum_{k=1}^{m-1} \mathit{length}(\mathit{overlap}(\alpha_{j_k}^R, \alpha_{j_{k+1}}))$$

where $\mathit{length}(\gamma)$ is the number of gates in the sequence $\gamma$.

Method:

procedure PalindromeTransform($PS$, $m$) {
    initialize a trie $T$;
    for $j = 1$ to $m$ do
        enter($\alpha_j \Delta_j$, $T$);
    dfsPrint($T$);
}

procedure enter($\mathit{string}$, $T$) {
    let $\mathit{string} = A_1 A_2 \ldots A_k$;
    start at root of $T$;
    follow the longest path $A_1 A_2 \ldots A_p$ in $T$ that
        spells out a prefix of $\mathit{string}$ ending at node $x$;
    create a new path starting at node $x$ that spells out
        $A_{p+1} A_{p+2} \ldots A_k$;
}

procedure dfsPrint($T$) {
    visit the nodes of $T$ in a depth-first-search order
        printing the label of each leaf when it is first encountered;
}

¹ The results in this section also apply to subcircuits of the form $A_1 \ldots A_k \beta A_k^{-1} \ldots A_1^{-1}$, but
these do not arise in the context of two-level decomposition.

We call the trie produced by Algorithm 3 the palindrome trie. By entering the strings $\alpha_j \Delta_j$ into the
trie, we identify the maximal length common prefixes for all palindromic subcircuits. Note that we are using
$\Delta_j$ to represent the palindromic subcircuit $\alpha_j \Delta_j \alpha_j^R$. By grouping the labels of the leaves of the
trie in a depth-first-search order [1, 5], we order the palindromic subcircuits to achieve the maximal possible
total overlap of self-inverting gates between successive subcircuits.

We can characterize the orderings of the leaves of the palindrome trie that are maximal overlap sequences.
Let $T$ be a trie whose root node has $p$ subtries with exactly one child labeled $\Delta_1, \ldots, \Delta_p$, $p \ge 0$,
and $q$ subtries $T_1, \ldots, T_q$, $q \ge 0$, where each subtrie $T_k$ has more than one child, as shown in Figure 5.
We assume that $p + q > 0$ and that the $p + q$ subtries can appear in any order.



Figure 5: A generic trie. 

Let $\mathit{mos}(T)$ be the set of all sequences of leaf-labels of $T$ that are characterized by the recurrence

$$\mathit{mos}(T) = \mathit{permutation}(\Delta_1, \ldots, \Delta_p, \mathit{mos}(T_1), \ldots, \mathit{mos}(T_q))$$

where $\mathit{permutation}(x_1, \ldots, x_m)$ is the set of all sequences that are permutations of the sequences
$x_1, \ldots, x_m$. We shall show that any sequence in $\mathit{mos}(T)$ is a maximal overlap sequence and conversely
every maximal overlap sequence is in $\mathit{mos}(T)$. Listing the leaves of the trie in a depth-first-search order
is one efficient way to produce such a sequence.

Theorem 1 Let $T$ be a palindrome trie for a set $PS$ of palindromic subcircuits. A sequence of palindromic
subcircuits from $PS$ is a maximal overlap sequence if and only if it is in $\mathit{mos}(T)$.

Proof. To show that every sequence in $\mathit{mos}(T)$ is a maximal overlap sequence we use structural induction
on $T$. The sequences in $\mathit{mos}(T)$ recursively keep the leaves of the subtries of $T$ contiguous. Single-leaf
subtries of $T$ correspond to palindromic subcircuits that cannot participate in any prefix sharing. If $T_j$ is
a subtrie of $T$ with $k$ leaves, where $k > 1$, then $T_j$ adds $2(k - 1)$ to the number of cancelling contiguous
self-inverting gates by sharing the gate represented by the branch from the root of $T$ to the root of the subtrie
$T_j$. Assuming every sequence in $\mathit{mos}(T_j)$ is a maximal overlap sequence, then every sequence in
$\mathit{mos}(T)$ attains the maximal amount of sharing and thus maximizes the sum of the lengths of the overlaps
between successive palindromic subcircuits. Thus every sequence in $\mathit{mos}(T)$ is a maximal overlap sequence.

Conversely, it is easy to show that every maximal overlap sequence for $PS$ corresponds to some traversal
of the palindrome trie for $PS$ represented in $\mathit{mos}(T)$. □

Corollary 1 The procedure PalindromeTransform($PS$, $m$) produces an ordering for the $m$ circuits in $PS$
that maximizes the total number of cancelling self-inverting gates.

Proof. The depth-first-search ordering of the leaves of the palindrome trie for $PS$ has the $\mathit{mos}$ property. □

Corollary 2 The number of gates in the circuit produced by the palindrome transform ordering after
cancelling all self-inverting gates is

(number of leaves in trie) + 2(number of interior nodes in trie)

Proof. Note that a path $\alpha_j$ from the root of the palindrome trie to a leaf labeled by $\Delta_j$ followed by the
reverse path $\alpha_j^R$ defines a palindromic subcircuit $\alpha_j \Delta_j \alpha_j^R$. One gate is generated for each leaf.
Each incoming branch to an interior node generates one gate before the leaf to perform an operation and one gate
after the leaf to invert the effect of that operation. □
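The trie construction, the depth-first leaf ordering, and the gate count of Corollary 2 can be illustrated with
a small Python sketch. The nested-dictionary trie, the function names, and the four example subcircuits are
assumptions made for illustration; gates are plain string labels, and the interior-node count excludes the root,
which carries no gate.

```python
def build_trie(subcircuits):
    """Nested-dict trie over the strings (self-inverting prefix + middle gate)."""
    root = {}
    for alpha, middle in subcircuits:
        node = root
        for gate in alpha + [middle]:
            node = node.setdefault(gate, {})
    return root

def dfs_leaves(node, path=()):
    """Yield (leaf label, path of self-inverting gates) in depth-first order."""
    for gate, child in node.items():
        if child:
            yield from dfs_leaves(child, path + (gate,))
        else:
            yield gate, path

def count_nodes(node):
    """Return (number of leaves, number of non-root interior nodes)."""
    leaves = interior = 0
    for child in node.values():
        if child:
            sub_leaves, sub_interior = count_nodes(child)
            leaves, interior = leaves + sub_leaves, interior + sub_interior + 1
        else:
            leaves += 1
    return leaves, interior

def common_prefix(a, b):
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

subcircuits = [(["A", "B", "C"], "D1"), (["A", "C"], "D3"), (["A", "B"], "D2"), (["B"], "D4")]
trie = build_trie(subcircuits)
ordered = list(dfs_leaves(trie))
order = [leaf for leaf, _ in ordered]

# Gates left after cancellation: total length minus two gates per unit of overlap.
total = sum(2 * len(path) + 1 for _, path in ordered)
total -= 2 * sum(common_prefix(ordered[i][1], ordered[i + 1][1]) for i in range(len(ordered) - 1))
leaves, interior = count_nodes(trie)
print(order, total, leaves + 2 * interior)
```

In this example the depth-first order keeps the subcircuits that share the prefix A, B adjacent, and the number
of remaining gates agrees with the count given by Corollary 2.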

The palindrome transform assumes the palindromic subcircuits can be concatenated in any order. If
we treat the middle gate of each palindromic subcircuit as a generic gate, then we can use the palindrome
transform to generate for an arbitrary unitary matrix $U$ a sequence of controlled single-qubit gates in which
the maximum amount of cancelling of self-inverting gates takes place, assuming a fixed column order of
two-level decomposition.

To do this, we first construct palindromic subcircuits with a generic middle gate from the Gray codes for
the conventional ordering of two-level decomposition for $U$. From these palindromic subcircuits, we use the
palindrome transform to find an $\mathit{mos}$ ordering of the generic gates. Using this $\mathit{mos}$ ordering, we then
use Algorithms 1 and 2 of the previous sections to construct the quantum circuit $C$ of $\Lambda_{n-1}(\tilde{V}_j)$ and
$\Lambda_{n-1}(X)$ gates such that $C$ computes $U$. The circuit $C$ will have the maximal amount of cancellation of
$\Lambda_{n-1}(X)$ gates due to the juxtaposition of self-annihilating sequences. Note that any $\mathit{mos}$ ordering
produced in this fashion generates a circuit that computes $U$.

In the next section, we will give a direct enumerative method of constructing a circuit of this nature
without having to construct the palindrome trie.

7 Palindromic Optimization Algorithm 

We now describe our Palindromic Optimization Algorithm (POA). It takes as input a $2^n \times 2^n$ unitary matrix
$U$ and produces as output a circuit $G_m \ldots G_1$ of controlled single-qubit gates that computes $U$ while
minimizing the number of $\Lambda_{n-1}(X)$ gates in the generated circuit.

POA performs a two-level decomposition on $U$, assuming a fixed-column order $0, 1, \ldots, 2^n - 2$, where
the columns of the matrix are labeled 0 to $2^n - 1$ [13]. It uses a specially computed array $\mathit{array}_n$ to
direct the two-level decomposition in order to minimize the number of $\Lambda_{n-1}(X)$ gates in the generated
circuit. The order of two-level decomposition directs the generation of a sequence $V_1 \ldots V_k$ of two-level
matrices such that $V_1 \cdots V_k = U$.

Figure 6: A subsequence of the unoptimized circuit for an arbitrary $2^3 \times 2^3$ unitary matrix using the
conventional ordering.



POA uses Algorithm 2 to generate the output circuit from $V_1 \ldots V_k$. It uses the Gray code algorithm
described in Section 5 to determine the sequences of $\Lambda_{n-1}(X)$ gates to perform the state changes to bring
together the two nontrivial vector components for each controlled $\Lambda_{n-1}(\tilde{V}_j)$ gate. We require the Gray
code ordering to be $2^0, 2^1, \ldots, 2^{n-1}$, where $n$ is the number of qubits, to achieve the minimal number of
$\Lambda_{n-1}(X)$ gates. If a different Gray code order is used, the minimal number of $\Lambda_{n-1}(X)$ gates may not be
achieved for all $n$. For the stated setting, POA maximizes the overlap of $\Lambda_{n-1}(X)$ gates over all two-level
matrix decompositions, thus minimizing the number of $\Lambda_{n-1}(X)$ gates in the generated circuit.

Algorithm 4: Palindromic Optimization Algorithm

Input: A $2^n \times 2^n$ unitary matrix $U$ and $n$, the number of qubits.

Output: A circuit of $(n-1)$-controlled single-qubit gates that computes $U$.

Method:

procedure POA($U$, $n$) {
    $\mathit{array}_n$ = ProduceArray($n$);
    $(V_1 \ldots V_k)$ = TwoLevelDecompose($U$, $\mathit{array}_n$);
    $(G_m \ldots G_1)$ = ConstructCircuit($V_1 \ldots V_k$);
}

procedure ProduceArray($n$) {
    $\mathit{array}_2[0..3, 0..3]$ = [base-case array for $n = 2$; entries not shown];
    for $m = 3$ to $n$ do {
        $k = 2^{m-1}$;
        for $c = 0$ to $2^{m-1} - 1$ do {
            $\mathit{array}_m[k, 2c] = 2c + 1$;
            for $r = 1$ to $2^{m-1} - c - 1$ do {
                $\mathit{array}_m[r, 2c] = 2\,\mathit{array}_{m-1}[r, c]$;
                $\mathit{array}_m[r + k, 2c] = 2\,\mathit{array}_{m-1}[r, c] + 1$;
                $\mathit{array}_m[r, 2c + 1] = 2\,\mathit{array}_{m-1}[r, c]$;
                $\mathit{array}_m[r + k - 1, 2c + 1] = 2\,\mathit{array}_{m-1}[r, c] + 1$;
            }
            $k = k - 1$;
        }
    }
    return $\mathit{array}_n$;
}

We now prove the optimality of POA assuming a fixed-column ordering $0, 1, \ldots, 2^n - 2$ for a two-level
decomposition, a right-to-left bit ordering $2^0, 2^1, \ldots, 2^{n-1}$ for the Gray code order, and ordering pairs
$(r, c)$ in which $r > c$ and the sequence of state changes must occur from $c$ to $r$.

Figure 7: A subsequence of the optimized circuit for a $2^3 \times 2^3$ unitary matrix using POA.

Let $PS(c, r)$ be the palindromic subcircuit generated for the Gray code sequence returned by the procedure
GrayCode($c$, $r$) in Algorithm 2. First, we examine the intercolumn ordering of the entries in $\mathit{array}_n$
and the row ordering within a given column necessary to achieve a minimal $\Lambda_{n-1}(X)$ circuit for $U$. Then
we prove that the ordering of the entries from row 1 to row $2^n - c - 1$ in each column $c$ in $\mathit{array}_n$ is a
maximal overlap sequence for $0 \le c \le 2^n - 2$.

Lemma 1 The maximum possible overlap of $\Lambda_{n-1}(X)$ gates between the last palindromic subcircuit generated
for column $c$ and the first palindromic subcircuit generated for column $c+1$ is 1, for $0 \le c < 2^n - 2$. Further,
an overlap of 1 is achieved between the circuit $PS(c, r_{last})$ followed by the circuit $PS(c+1, r_{first})$, where
$r_{last}$ is the last entry in column $c$ and $r_{first}$ is the first entry in column $c+1$, only when $c$ is even,
$r_{last}$ is odd, and $r_{first}$ is even.

Proof. For $n$ qubits, we have a fixed column ordering $0, 1, 2, \ldots, 2^n - 2$. Let us first consider the case where column $c$ is even. 

We would like $PS(c, r_{last})$ and $PS(c+1, r_{first})$ to overlap and thus share one or more $\Lambda_{n-1}(X)$ gates. Since $c$ is even and $c+1$ is odd, the $2^0$ bit of the binary expansion of $c$ is 0 and the $2^0$ bit of $c+1$ is 1. For an overlap to occur, the $2^0$ bit of $r_{last}$ must be 1 and the $2^0$ bit of $r_{first}$ must be 0. Thus, an overlap between subcircuits $PS(c, r_{last})$ and $PS(c+1, r_{first})$ occurs only when $r_{last}$ is odd and $r_{first}$ is even. Furthermore, the maximum overlap is 1 since after flipping the $2^0$ bit of $r_{last}$ to 1, it remains 1. Similarly, the $2^0$ bit of $r_{first}$ remains 0. Thus only one overlap can occur. 

Now consider the case where column $c$ is odd. Using the same reasoning as above, an overlap can occur between $PS(c, r_{last})$ and $PS(c+1, r_{first})$ only when $r_{last}$ is even and $r_{first}$ is odd. But, if $c$ is odd, there must be at least one 1 in the binary expansion of $c+1$ that is not present in $c$. Since the first bit flip is on bit $2^0$, there cannot be an overlap due to this differing 1 and thus the maximum overlap is 0. □ 



Lemma 2 Within a column c, an overlap can occur between the subcircuits generated for two adjacent rows 
only if the entries for both rows are even or both are odd. 

Proof. First consider the case where column $c$ is even and $r_1$ and $r_2$ are the entries for two adjacent rows in column $c$. We have the following combinations: 

i. $r_1$ is odd, $r_2$ is even: Since only GrayCode(c, r_1) requires a $2^0$ bit flip, $PS(c, r_1)$ and $PS(c, r_2)$ cannot have an overlap. 

ii. $r_1, r_2$ are both odd: Since both pairs require a $2^0$ bit flip, there exists at least one overlap. 

iii. $r_1$ is even, $r_2$ is odd: There cannot be an overlap. 

iv. $r_1, r_2$ are both even: Since both pairs have a 0 in bit $2^0$, there may be an overlap. 

Similarly, if $c$ is an odd column, then an overlap can occur only when $r_1$ and $r_2$ are both even or both odd. □ 

We now prove that POA generates maximal overlap sequences. Let $R^c_m$ be the sequence 

$$\mathit{array}_m[1, c],\ \mathit{array}_m[2, c],\ \ldots,\ \mathit{array}_m[2^m - c - 1, c]$$ 

of row entries created by the procedure ProduceArray for column $c$ of $\mathit{array}_m$. 

Lemma 3 $R^c_m$ is a maximal overlap sequence, for $0 \le c \le 2^m - 2$ and $3 \le m \le n$. 

Proof. We prove by induction on $m$ that $R^c_m$ is a maximal overlap sequence. Let the base case be $m = 3$. By inspection of the $2^3 \times 2^3$ array $\mathit{array}_3$, the sequences $R^c_3$ for columns $c = 0, 1, \ldots, 6$ are maximal overlap sequences. 

For the inductive step, assume $R^c_{m-1}$ is a maximal overlap sequence. Column $c$ of $\mathit{array}_{m-1}$ generates columns $2c$ and $2c+1$ of $\mathit{array}_m$ as follows: 

$$R^{2c}_m = 2R^c_{m-1},\ 2c+1,\ 2R^c_{m-1}+1 \qquad (7)$$ 

$$R^{2c+1}_m = 2R^c_{m-1},\ 2R^c_{m-1}+1 \qquad (8)$$ 

where 

$$2R^c_{m-1} = 2\,\mathit{array}_{m-1}[1, c],\ \ldots,\ 2\,\mathit{array}_{m-1}[2^{m-1} - c - 1, c]$$ 

and 

$$2R^c_{m-1} + 1 = 2\,\mathit{array}_{m-1}[1, c] + 1,\ \ldots,\ 2\,\mathit{array}_{m-1}[2^{m-1} - c - 1, c] + 1.$$ 

Let us now examine how the palindromic subcircuits generated by the columns of $\mathit{array}_m$ are related to the subcircuits generated from $\mathit{array}_{m-1}$. Let $PS^c_m$ be the sequence of palindromic subcircuits generated by Algorithm 2 for the row entries in $R^c_m$ in column $c$ of $\mathit{array}_m$. 

The GrayCode sequence GrayCode(2c, 2r) is equivalent to a left shift of the sequence GrayCode(c, r) with a 0 entering in the $2^0$ bit position in each binary expansion. Similarly, GrayCode(2c + 1, 2r + 1) is equivalent to a left shift of GrayCode(c, r) with a 1 entering in the $2^0$ bit position in each binary expansion. Both GrayCode(2c, 2r + 1) and GrayCode(2c + 1, 2r) require one additional binary expansion in addition to those in GrayCode(c, r) since an initial bit flip on bit $2^0$ is now required. 

The sequence of palindromic subcircuits $PS^{2c}_m$ is constructed from the sequence of Gray codes generated by GrayCode(2c, j) for all $j$'s in $R^{2c}_m$. Similarly, the sequence of palindromic subcircuits $PS^{2c+1}_m$ is constructed from the sequence of Gray codes generated by GrayCode(2c + 1, j) for all $j$'s in $R^{2c+1}_m$. 

We therefore see that the binary code expansions derived from the row entries in $R^c_{m-1}$ are uniformly shifted. Further, since $R^{2c}_m$ is the concatenation of $2R^c_{m-1}$ with $2c+1,\ 2R^c_{m-1}+1$, the concatenation does not generate any new overlaps: $2R^c_{m-1}$ consists of even entries, while the entry $2c+1$ and those in $2R^c_{m-1}+1$ are all odd, so by Lemma 2 no overlap is possible at the junction. Similarly for $R^{2c+1}_m$. Assuming $R^c_{m-1}$ was a maximal overlap sequence, we conclude $R^{2c}_m$ and $R^{2c+1}_m$ are also each maximal overlap sequences. □ 
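
The shift relations used in the proof of Lemma 3 can be checked on small cases. The sketch below assumes a Gray code routine that flips the differing bits of c one at a time, from bit 2^0 up to bit 2^(n-1), until r is reached. Algorithm 2 is not reproduced in this section, so `gray_code` is a hypothetical stand-in consistent with the right-to-left bit ordering stated above, not the authors' procedure.

```python
def gray_code(c, r, n):
    """Return a Gray code path of n-bit states from c to r.

    ASSUMPTION: differing bits are flipped one at a time in the order
    2^0, 2^1, ..., 2^(n-1).
    """
    seq = [c]
    state = c
    for bit in range(n):
        mask = 1 << bit
        if (state ^ r) & mask:   # this bit still differs from the target
            state ^= mask
            seq.append(state)
    return seq


if __name__ == "__main__":
    base = gray_code(0, 3, 2)
    print(base)                        # [0, 1, 3]
    print(gray_code(0, 6, 3))          # left shift, 0 enters bit 2^0: [0, 2, 6]
    print(gray_code(1, 7, 3))          # left shift, 1 enters bit 2^0: [1, 3, 7]
    print(gray_code(0, 7, 3))          # one extra state from the initial flip
```

Under this assumption, GrayCode(2c, 2r) is the elementwise double of GrayCode(c, r), GrayCode(2c + 1, 2r + 1) is the elementwise double plus one, and the mixed-parity cases are exactly one state longer.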



Theorem 2 For a fixed-column two-level decomposition of an arbitrary $2^n \times 2^n$ unitary matrix, the Palindromic Optimization Algorithm produces a circuit that achieves the maximal length of overlaps between successive palindromic subcircuits and thus minimizes the number of $\Lambda_{n-1}(X)$ gates generated in the quantum circuit of $(n-1)$-controlled single-qubit and $(n-1)$-controlled-NOT gates. 

Proof. The proof follows from Lemmas 1-3. □ 



8 Gate Count Equations 

We now quantify the number of gates in the circuits generated by our algorithms. In all our equations 
n is the number of qubits. We first derive the equation for the number of gates produced by using the 
conventional two-level decomposition algorithm assuming no cancelling of self-inverting gates. We then give 
the gate count for conventional two-level decomposition with cancellation. Finally, we derive the equation 
that gives the number of gates in the optimized circuit resulting from performing two-level decomposition 
in the order specified by POA. 

8.1 Conventional Circuit Size 



We will show that $c_n$, the number of gates in the unoptimized circuit produced using the conventional order of two-level decomposition, is given by 

$$c_n = (n-1)\,2^{2n-1} + 2^{n-1} \qquad (9)$$ 

We can determine the size of the circuit produced by the two-level decomposition algorithm for a $2^n \times 2^n$ unitary matrix using the conventional ordering by taking the number of Gray codes of length $j$ generated by Algorithm 2, given by 

$$2^{n-1} \binom{n}{j}$$ 

and multiplying this number by $2j - 1$, the number of gates in the circuit generated for a Gray code of length $j$. Thus the number of gates in the conventional circuit for $n$ qubits is given by 

$$c_n = \sum_{j=1}^{n} 2^{n-1} \times \binom{n}{j} \times (2j - 1) = n\,2^{2n-1} - 2^{2n-1} + 2^{n-1} = (n-1)\,2^{2n-1} + 2^{n-1}$$ 

8.2 Conventional Circuit Size with Cancelling 

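The number of gates in the unoptimized circuit after cancelling adjacent $\Lambda_{n-1}(X)$ gates between palindromic subcircuits follows directly from Equation 9. From Lemmas 1 and 2, we conclude that only the inter-column overlaps allow for annihilation of gates using the conventional ordering array for order $n$. By Lemma 1, an overlap occurs only at a transition from an even column to the next column, and there are $2^{n-1} - 1$ such transitions, each removing two adjacent gates. The number of gates that cancel is therefore $2(2^{n-1} - 1)$, and subtracting this from Equation 9 gives the gate count equation 

$$\mathit{cc}_n = (n-1)\,2^{2n-1} - 2^{n-1} + 2 \qquad (10)$$ 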
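
The cancellation count can be reproduced by walking over the column transitions and applying the parity condition of Lemma 1. The sketch below is illustrative only: it assumes that the conventional ordering lists the rows of column c in increasing order, c + 1, ..., 2^n - 1, which this section does not state explicitly, and all function names are hypothetical.

```python
def conventional_count(n):
    """Equation 9: gate count without any cancellation."""
    return (n - 1) * 2 ** (2 * n - 1) + 2 ** (n - 1)


def cancelling_transitions(n):
    """Count column transitions c -> c + 1 that satisfy Lemma 1.

    ASSUMPTION: column c of the conventional ordering lists rows
    c + 1, ..., 2^n - 1 in increasing order.
    """
    count = 0
    for c in range(0, 2 ** n - 2):      # columns c and c + 1 are both non-empty
        r_last = 2 ** n - 1             # last row entry of column c
        r_first = c + 2                 # first row entry of column c + 1
        if c % 2 == 0 and r_last % 2 == 1 and r_first % 2 == 0:
            count += 1
    return count


def conventional_count_with_cancelling(n):
    """Equation 10: each overlapping transition removes two adjacent gates."""
    return conventional_count(n) - 2 * cancelling_transitions(n)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, cancelling_transitions(n), conventional_count_with_cancelling(n))
```

Under this assumption the number of cancelling transitions is 2^(n-1) - 1, and the resulting counts match the "Conventional" column of Table 2.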



8.3 POA Circuit Size 

We will show that the number of gates $\mathit{poa}_n$ in the optimal circuit produced by the Palindromic Optimization Algorithm for an arbitrary $2^n \times 2^n$ unitary matrix is 

$$\mathit{poa}_n = \left(\frac{7}{3}\right) 2^{2n-1} - (7)\,2^{n-1} + \frac{10}{3} \qquad (11)$$ 

To derive Equation 11 for $2^n \times 2^n$ unitary matrices, we consider the ordering $\mathit{array}_{n-1}$ and apply POA to determine $\mathit{array}_n$ and the corresponding number of gates for the circuit for $n$. From column $c$ of $\mathit{array}_{n-1}$, POA determines columns $2c$ and $2c+1$ of $\mathit{array}_n$. 

Consider the case of the even column $2c$ in $\mathit{array}_n$. We note from the proof of Lemma 3 that the subtrie for this column is exactly the subtrie for column $c$ in $\mathit{array}_{n-1}$ with two additional branches as given in Equation 7: one branch at one further depth containing a copy of the subtrie and a single leaf containing a single gate. This implies that the number of gates generated by column $2c$ in $\mathit{array}_n$ is twice the number of gates generated by column $c$ in $\mathit{array}_{n-1}$ plus three, two for the additional branch and one for the additional leaf. 

Similarly, the odd column $2c+1$ in $\mathit{array}_n$ generates two times the number of gates generated for column $c$ in $\mathit{array}_{n-1}$ plus two gates required for the additional branch as given in Equation 8. 

Note that $R^{2^{n-1}-1}_{n-1}$ is empty, so $R^{2^n-2}_n$ contains a single entry $2^n - 1$ and $R^{2^n-1}_n$ is empty. 

We can assemble these observations into a recursive formula to calculate the number of gates in the optimized circuit. Let $T^c_n$ be the number of gates generated for the $c$-th column of $\mathit{array}_n$, $0 \le c \le 2^n - 2$. We have 

$$T^{0}_{n} = 2T^{0}_{n-1} + 3 \qquad (12)$$ 
$$T^{1}_{n} = 2T^{0}_{n-1} + 2 \qquad (13)$$ 
$$\vdots$$ 
$$T^{2^n-4}_{n} = 2T^{2^{n-1}-2}_{n-1} + 3 \qquad (14)$$ 
$$T^{2^n-3}_{n} = 2T^{2^{n-1}-2}_{n-1} + 2 \qquad (15)$$ 

For the calculation of the two final columns of $\mathit{array}_n$ from the final column of $\mathit{array}_{n-1}$ we have 

$$T^{2^n-2}_{n} = 2T^{2^{n-1}-1}_{n-1} + 1 = 1 \qquad (16)$$ 
$$T^{2^n-1}_{n} = 2T^{2^{n-1}-1}_{n-1} = 0 \qquad (17)$$ 

Let $\mathit{poa}_n$ be the total number of gates generated by POA using $\mathit{array}_n$. Summing the gate counts for every column and recalling that the number of gates that cancel due to inter-column overlaps is $2(2^{n-1} - 1)$, $\mathit{poa}_n$ is then given by the recurrence 

$$\mathit{poa}_n = 4\left(\mathit{poa}_{n-1} + (2^{n-1} - 2)\right) + 5\,(2^{n-1} - 1) + 1 - 2\,(2^{n-1} - 1) \qquad (18)$$ 
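
The column-by-column bookkeeping behind this recurrence can be sketched directly. The code below is a minimal illustration, not the authors' implementation: `column_counts` and `poa_count` are hypothetical names, and the one-qubit seed (one gate for column 0, none for the empty column 1) is an assumption chosen because it reproduces the n = 2 entry of Table 2.

```python
def column_counts(n, seed=(1, 0)):
    """Per-column gate counts T_n^c for c = 0, ..., 2^n - 1.

    ASSUMPTION: seed holds one-qubit counts, one gate for column 0 and
    none for the empty column 1.
    """
    counts = list(seed)
    for _ in range(n - 1):
        nxt = []
        for t in counts[:-1]:        # every column except the empty last one
            nxt.append(2 * t + 3)    # even column 2c     (Equations 12 and 14)
            nxt.append(2 * t + 2)    # odd column 2c + 1  (Equations 13 and 15)
        nxt.extend([1, 0])           # final two columns  (Equations 16 and 17)
        counts = nxt
    return counts


def poa_count(n):
    """Sum all columns, then remove the 2(2^(n-1) - 1) cancelling gates."""
    return sum(column_counts(n)) - 2 * (2 ** (n - 1) - 1)


if __name__ == "__main__":
    print(column_counts(3))   # [13, 12, 11, 10, 5, 4, 1, 0]
    print([poa_count(n) for n in range(2, 8)])
```

For n = 2, ..., 7 the totals agree with the "Palindromic" column of Table 2, and each total satisfies Equation 18 when computed from the previous one.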

Solving Equation 18 gives 

$$\mathit{poa}_n = \sum_{j=n}^{2n-2} 2^{j} + \sum_{j=1}^{n-1} 2^{2j}\,(2^{n-j} - 1) - \sum_{j=1}^{n-2} 2^{j} \qquad (19)$$ 

Simplifying this equation, we get 

$$\mathit{poa}_n = \frac{7}{3}\,(2^{2n-1}) - 7\,(2^{n-1}) + \frac{10}{3}.$$ 
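
The simplification can be followed by evaluating the geometric sums one at a time. The worked step below assumes that Equation 19 is read as three geometric sums, over $2^j$ for $j = n, \ldots, 2n-2$, over $2^{2j}(2^{n-j}-1)$ for $j = 1, \ldots, n-1$, and over $2^j$ for $j = 1, \ldots, n-2$; this reading reproduces the Table 2 values for small $n$.

```latex
\begin{align*}
\sum_{j=n}^{2n-2} 2^{j} &= 2^{2n-1} - 2^{n}, \\
\sum_{j=1}^{n-1} 2^{2j}\,(2^{n-j} - 1)
  &= \sum_{j=1}^{n-1} 2^{n+j} - \sum_{j=1}^{n-1} 4^{j}
   = \left(2^{2n} - 2^{n+1}\right) - \tfrac{1}{3}\left(2^{2n} - 4\right), \\
\sum_{j=1}^{n-2} 2^{j} &= 2^{n-1} - 2, \\
\mathit{poa}_n
  &= \left(2^{2n-1} - 2^{n}\right)
   + \left(\tfrac{4}{3}\,2^{2n-1} - 2^{n+1} + \tfrac{4}{3}\right)
   - \left(2^{n-1} - 2\right) \\
  &= \tfrac{7}{3}\,2^{2n-1} - 7\cdot 2^{n-1} + \tfrac{10}{3}.
\end{align*}
```

For example, $n = 3$ gives $\tfrac{7}{3}\cdot 32 - 7\cdot 4 + \tfrac{10}{3} = 50$, the value listed for three qubits in Table 2.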



9 Results 

The Palindromic Optimization Algorithm results in a dramatic reduction in circuit size over the conventional 
method. Table 2 lists circuit sizes for n = 2, ... ,7 qubits resulting from two-level decomposition using the 
ordering produced by POA, the conventional ordering, and the conventional ordering with no annihilation 
of self-inverting gates. 

When we use the conventional ordering [13] for two-level decomposition on a $2^3 \times 2^3$ unitary matrix, the resulting circuit contains 62 gates. Figure 6 shows the initial sequence of gates in this circuit. However, our Palindromic Optimization Algorithm produces a circuit with 50 gates. Figure 7 shows the initial sequence of gates in this optimized circuit. 

The reduction grows with the number of qubits: by Equations 10 and 11, the ratio of the conventional gate count to the POA gate count increases approximately linearly in $n$. For example, when $n = 7$, our method reduces the number of gates from 49,090 to 18,670 over the conventional method, a more than 60% reduction. 
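
The size of the reduction for each n can be computed directly from the gate counts reported in Table 2. This is a small illustrative calculation; the dictionary and function names are hypothetical, and the numbers are copied from the table rather than recomputed.

```python
# Gate counts copied from Table 2: n -> (palindromic, conventional).
TABLE_2 = {
    2: (8, 8),
    3: (50, 62),
    4: (246, 378),
    5: (1086, 2034),
    6: (4558, 10210),
    7: (18670, 49090),
}


def reduction_percent(n):
    """Percentage of gates removed by POA relative to the conventional order."""
    poa, conventional = TABLE_2[n]
    return 100.0 * (conventional - poa) / conventional


def size_ratio(n):
    """Conventional gate count divided by the POA gate count."""
    poa, conventional = TABLE_2[n]
    return conventional / poa


if __name__ == "__main__":
    for n in sorted(TABLE_2):
        print(n, round(reduction_percent(n), 1), round(size_ratio(n), 2))
```

The percentage rises with every additional qubit and passes 60% at n = 7, consistent with the figure quoted above.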



n    Palindromic    Conventional    No canceling
2              8               8              10
3             50              62              68
4            246             378             392
5           1086            2034            2064
6           4558           10210           10272
7          18670           49090           49216

Table 2: Number of $(n-1)$-controlled gates in an $n$-qubit circuit using our algorithm, the conventional ordering, and the conventional ordering without canceling palindromes. 



10 Conclusions 

In this paper we have presented a framework for compiling an arbitrary $2^n \times 2^n$ unitary matrix into a quantum circuit of $(n-1)$-controlled single-qubit and $(n-1)$-controlled-NOT gates in which the initial phase of the framework decomposes the matrix into a sequence of two-level matrices. We have shown that the order of two-level decomposition can have a dramatic impact on the size of the resulting quantum circuits and we have characterized those orders of two-level decomposition that, for a fixed-column ordering, minimize the number of $(n-1)$-controlled-NOT gates that get generated. We have also presented an enumerative Palindromic Optimization Algorithm that produces circuits with the minimal number of controlled-NOT gates. This algorithm yields circuits that are significantly smaller than those produced by the conventional ordering for two-level decomposition. 